On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning

Authors

  • Richard H. Byrd
  • Gillian M. Chin
  • Will Neveitt
  • Jorge Nocedal
Abstract

This paper describes how to incorporate sampled curvature information in a Newton-CG method and in a limited memory quasi-Newton method for statistical learning. The motivation for this work stems from supervised machine learning applications involving a very large number of training points. We follow a batch approach, also known in the stochastic optimization literature as a sample average approximation (SAA) approach. Curvature information is incorporated in two sub-sampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. A crucial feature of our technique is that Hessian-vector multiplications are carried out with a significantly smaller sample size than is used for the function and gradient. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.

∗Department of Computer Science, University of Colorado, Boulder, CO, USA. This author was supported by National Science Foundation grant CMMI 0728190 and Department of Energy grant DE-SC0001774.
†Department of Industrial Engineering and Management Sciences, Northwestern University. This author was supported by an NSERC fellowship and a grant from Google Inc.
‡Google Research.
§Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA. This author was supported by National Science Foundation grant DMS-0810213 and by Department of Energy grant DE-FG02-87ER25047-A004.
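Only the abstract is available here, but the crucial feature it describes lends itself to a short illustration. Below is a minimal sketch, assuming a binary logistic regression objective with labels in {-1, +1}: the gradient is computed over the full training set, while each Hessian-vector product inside the conjugate gradient (CG) solver touches only a small subsample. All function names, the damping term, and parameter choices such as `hess_sample=200` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def full_gradient(w, X, y):
    """Average gradient of the logistic loss over (X, y), labels y in {-1, +1}."""
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(z))           # sigma(-z)
    return -(X.T @ (y * s)) / len(y)

def subsampled_hvp(w, X_S, v, damping=1e-8):
    """Hessian-vector product of the logistic loss, formed on the small
    subsample X_S only: H_S v = (1/|S|) X_S^T D X_S v, with
    D = diag(sigma(z_i)(1 - sigma(z_i))). The tiny damping term is an
    implementation convenience (not from the paper) that keeps CG well
    defined when the subsampled Hessian is rank deficient."""
    z = X_S @ w
    sig = 1.0 / (1.0 + np.exp(-z))
    d = sig * (1.0 - sig)
    return X_S.T @ (d * (X_S @ v)) / X_S.shape[0] + damping * v

def cg_solve(hvp, b, tol=1e-4, max_iter=50):
    """Conjugate gradient for H x = b, accessing H only through hvp(v)."""
    x = np.zeros_like(b)
    r = b.copy()                          # residual b - Hx, with x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.sqrt(b @ b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def subsampled_newton_cg(X, y, hess_sample=200, iters=20, step=1.0, seed=0):
    """Newton-CG in which the gradient uses all N points but every
    Hessian-vector product inside CG uses only `hess_sample` points."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        g = full_gradient(w, X, y)                     # full (large) sample
        S = rng.choice(n, size=min(hess_sample, n), replace=False)
        d = cg_solve(lambda v: subsampled_hvp(w, X[S], v), -g)
        w = w + step * d                               # fixed step for brevity
    return w

# Toy usage: 10,000 points, 50 features, Hessian subsample of 200.
rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 50))
y = np.sign(X @ rng.standard_normal(50) + 0.1 * rng.standard_normal(10_000))
w_opt = subsampled_newton_cg(X, y)
```

The point of the construction is cost: each CG iteration spends O(|S|·d) on the Hessian-vector product instead of O(N·d), so with |S| ≪ N the per-iteration expense of the Newton-CG loop stays close to that of a single full gradient evaluation.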


Similar articles

On the Use of Stochastic Hessian Information in Unconstrained Optimization

This paper describes how to incorporate stochastic curvature information in a Newton-CG method and in a limited memory quasi-Newton method for large scale optimization. The motivation for this work stems from statistical learning and stochastic optimization applications in which the objective function is the sum of a very large number of loss terms, and can be evaluated with a varying degree of ...


A Hybrid Optimization Algorithm for Learning Deep Models

Deep learning is a subset of machine learning that is widely used in Artificial Intelligence (AI) fields such as natural language processing and machine vision. Its learning algorithms require optimization in multiple respects. Generally, model-based inference requires solving an optimization problem. In deep learning, the most important problem that can be solved by optimization is neural n...


Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

The resurgence of deep learning, as a highly effective machine learning paradigm, has brought back to life the old optimization question of non-convexity. Indeed, the challenges related to the large-scale nature of many modern machine learning applications are severely exacerbated by the inherent non-convexity in the underlying models. In this light, efficient optimization algorithms which can ...


Exact and Inexact Subsampled Newton Methods for Optimization

The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of the paper analyzes an in...




Journal:
  • SIAM Journal on Optimization

Volume 21, Issue -

Pages -

Publication date: 2011